Goto

Collaborating Authors

 Brazzaville


Predictors of disease outbreaks at continentalscale in the African region: Insights and predictions with geospatial artificial intelligence using earth observations and routine disease surveillance data

arXiv.org Artificial Intelligence

Objectives: Our research adopts computational techniques to analyze disease outbreaks weekly over a large geographic area while maintaining local-level analysis by incorporating relevant high-spatial resolution cultural and environmental datasets. The abundance of data about disease outbreaks gives scientists an excellent opportunity to uncover patterns in disease spread and make future predictions. However, data over a sizeable geographic area quickly outpace human cognition. Our study area covers a significant portion of the African continent (about 17,885,000 km2). The data size makes computational analysis vital to assist human decision-makers. Methods: We first applied global and local spatial autocorrelation for malaria, cholera, meningitis, and yellow fever case counts. We then used machine learning to predict the weekly presence of these diseases in the second-level administrative district. Lastly, we used machine learning feature importance methods on the variables that affect spread. Results: Our spatial autocorrelation results show that geographic nearness is critical but varies in effect and space. Moreover, we identified many interesting hot and cold spots and spatial outliers. The machine learning model infers a binary class of cases or none with the best F1 score of 0.96 for malaria. Machine learning feature importance uncovered critical cultural and environmental factors affecting outbreaks and variations between diseases. Conclusions: Our study shows that data analytics and machine learning are vital to understanding and monitoring disease outbreaks locally across vast areas. The speed at which these methods produce insights can be critical during epidemics and emergencies.


Few-shot Open Relation Extraction with Gaussian Prototype and Adaptive Margin

arXiv.org Artificial Intelligence

Few-shot relation extraction with none-of-the-above (FsRE with NOTA) aims at predicting labels in few-shot scenarios with unknown classes. FsRE with NOTA is more challenging than the conventional few-shot relation extraction task, since the boundaries of unknown classes are complex and difficult to learn. Meta-learning based methods, especially prototype-based methods, are the mainstream solutions to this task. They obtain the classification boundary by learning the sample distribution of each class. However, their performance is limited because few-shot overfitting and NOTA boundary confusion lead to misclassification between known and unknown classes. To this end, we propose a novel framework based on Gaussian prototype and adaptive margin named GPAM for FsRE with NOTA, which includes three modules, semi-factual representation, GMM-prototype metric learning and decision boundary learning. The first two modules obtain better representations to solve the few-shot problem through debiased information enhancement and Gaussian space distance measurement. The third module learns more accurate classification boundaries and prototypes through adaptive margin and negative sampling. In the training procedure of GPAM, we use contrastive learning loss to comprehensively consider the effects of range and margin on the classification of known and unknown classes to ensure the model's stability and robustness. Sufficient experiments and ablations on the FewRel dataset show that GPAM surpasses previous prototype methods and achieves state-of-the-art performance.


Modelling and characterization of fine Particulate Matter dynamics in Bujumbura using low cost sensors

arXiv.org Machine Learning

Air pollution is a result of multiple sources including both natural and anthropogenic activities. The rapid urbanization of the cities such as Bujumbura economic capital of Burundi, is one of these factors. The very first characterization of the spatio-temporal variability of PM2.5 in Bujumbura and the forecasting of PM2.5 concentration have been conducted in this paper using data collected during a year, from august 2022 to august 2023, by low cost sensors installed in Bujumbura city. For each commune, an hourly, daily and seasonal analysis were carried out and the results showed that the mass concentrations of PM2.5 in the three municipalities differ from one commune to another. The average hourly and annual PM2.5 concentrations exceed the World Health Organization standards. The range is between 28.3 and 35.0 microgram/m3 . In order to make prediction of PM2.5 concentration, an investigation of RNN with Long Short Term Memory (LSTM) has been undertaken.


Contextualising Levels of Language Resourcedness affecting Digital Processing of Text

arXiv.org Artificial Intelligence

Application domains such as digital humanities and tool like chatbots involve some form of processing natural language, from digitising hardcopies to speech generation. The language of the content is typically characterised as either a low resource language (LRL) or high resource language (HRL), also known as resource-scarce and well-resourced languages, respectively. African languages have been characterized as resource-scarce languages (Bosch et al. 2007; Pretorius & Bosch 2003; Keet & Khumalo 2014) and English is by far the most well-resourced language. Varied language resources are used to develop software systems for these languages to accomplish a wide range of tasks. In this paper we argue that the dichotomous typology LRL and HRL for all languages is problematic. Through a clear understanding of language resources situated in a society, a matrix is developed that characterizes languages as Very LRL, LRL, RL, HRL and Very HRL. The characterization is based on the typology of contextual features for each category, rather than counting tools, and motivation is provided for each feature and each characterization. The contextualisation of resourcedness, with a focus on African languages in this paper, and an increased understanding of where on the scale the language used in a project is, may assist in, among others, better planning of research and implementation projects. We thus argue in this paper that the characterization of language resources within a given scale in a project is an indispensable component particularly in the context of low-resourced languages.


Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data

arXiv.org Artificial Intelligence

Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa. We report the quantity and content of available data with comparisons to population-matched nations in Europe as well as the distribution of data according to fine-grained intra-national wealth estimates. Temporal analyses are performed at two-year intervals to expose emerging data trends. Furthermore, we present findings for an ``othering'' phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers. The results of our study suggest that further work is required to capture image data representative of African people and their environments and, ultimately, to improve the applicability of computer vision models in a global context.


Question Answering and Question Generation for Finnish

arXiv.org Artificial Intelligence

Recent advances in the field of language modeling have improved the state-of-the-art in question answering (QA) and question generation (QG). However, the development of modern neural models, their benchmarks, and datasets for training them has mainly focused on English. Finnish, like many other languages, faces a shortage of large QA/QG model training resources, which has prevented experimenting with state-of-the-art QA/QG fine-tuning methods. We present the first neural QA and QG models that work with Finnish. To train the models, we automatically translate the SQuAD dataset and then use normalization methods to reduce the amount of problematic data created during the translation. Using the synthetic data, together with the Finnish partition of the TyDi-QA dataset, we fine-tune several transformer-based models to both QA and QG and evaluate their performance. To the best of our knowledge, the resulting dataset is the first large-scale QA/QG resource for Finnish. This paper also sets the initial benchmarks for Finnish-language QA and QG.


AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages

arXiv.org Artificial Intelligence

In recent years, multilingual pre-trained language models have gained prominence due to their remarkable performance on numerous downstream Natural Language Processing tasks (NLP). However, pre-training these large multilingual language models requires a lot of training data, which is not available for African Languages. Active learning is a semi-supervised learning algorithm, in which a model consistently and dynamically learns to identify the most beneficial samples to train itself on, in order to achieve better optimization and performance on downstream tasks. Furthermore, active learning effectively and practically addresses real-world data scarcity. Despite all its benefits, active learning, in the context of NLP and especially multilingual language models pretraining, has received little consideration. In this paper, we present AfroLM, a multilingual language model pretrained from scratch on 23 African languages (the largest effort to date) using our novel self-active learning framework. Pretrained on a dataset significantly (14x) smaller than existing baselines, AfroLM outperforms many multilingual pretrained language models (AfriBERTa, XLMR-base, mBERT) on various NLP downstream tasks (NER, text classification, and sentiment analysis). Additional out-of-domain sentiment analysis experiments show that \textbf{AfroLM} is able to generalize well across various domains. We release the code source, and our datasets used in our framework at https://github.com/bonaventuredossou/MLM_AL.


Modelling spatio-temporal trends of air pollution in Africa

arXiv.org Artificial Intelligence

Atmospheric pollution remains one of the major public health threat worldwide with an estimated 7 millions deaths annually. In Africa, rapid urbanization and poor transport infrastructure are worsening the problem. In this paper, we have analysed spatio-temporal variations of PM2.5 across different geographical regions in Africa. The West African region remains the most affected by the high levels of pollution with a daily average of 40.856 $\mu g/m^3$ in some cities like Lagos, Abuja and Bamako. In East Africa, Uganda is reporting the highest pollution level with a daily average concentration of 56.14 $\mu g/m^3$ and 38.65 $\mu g/m^3$ for Kigali. In countries located in the central region of Africa, the highest daily average concentration of PM2.5 of 90.075 $\mu g/m^3$ was recorded in N'Djamena. We compare three data driven models in predicting future trends of pollution levels. Neural network is outperforming Gaussian processes and ARIMA models.


Africa : IDRC to catalyze the ecosystem of AI innovators through research grants - Actu IA

#artificialintelligence

In 2020, IDRC and the Swedish International Development Cooperation Agency (Sida) launched the Artificial Intelligence for Development in Africa (IAPD Africa) program. This program aims to support the AI community and policymakers in developing responsible, ethical, and equitable AI that meets the continent's challenges, under the leadership of Africa. IDRC, the International Development Research Centre, was established in Canada in 1970 with a mission "to initiate, encourage, support and conduct research into the problems of the developing regions of the world and into the application of scientific, technical and other knowledge for the economic and social advancement of those regions . IDRC sees climate change and inequality, combined with the HIV/AIDS pandemic, as major obstacles to achieving the UN's sustainable development goals, and it is these challenges that it helps to address. While the center is headquartered in Ottawa, Canada, its five regional offices are located in India, Jordan, Kenya, Senegal, and Uruguay to be as close as possible to the researchers and projects it funds.


5 LOCALLY DEVELOPED AI APPLICATIONS IN NIGERIA - DigiLaw

#artificialintelligence

While there are many training programs in Nigeria related to developing AI talent, it is usually difficult to point out AI applications that are locally developed and are already available in the marketplace. This article will hopefully be the first of a series of articles; I will be uncovering several locally developed AI applications in Nigeria. Some you might know, some you won't. All in all, my goal in this series is to dispel beliefs that AI is an imported technology in Nigeria. Nigeria may not be pulling its weight compared to Ghana, South Africa, and Kenya in the African AI space, but there are notable strides that we should be aware of.